Abstract. Computing an optimal chain of fragments is a classical problem
in string algorithms, with important applications in computational
biology. There exist two efficient dynamic programming algorithms solving
this problem, based on different principles. In the present note, we
show how it is possible to combine the principles of these two algorithms
in order to design a hybrid dynamic programming algorithm that
combines the advantages of both algorithms.


1 Introduction 


The need for very efficient pairwise sequence alignment algorithms has motivated
the development of methods aimed at breaking the natural quadratic time
complexity barrier of dynamic programming alignment algorithms [S]. One of
the successful alternative approaches is based on the technique of chaining
fragments. Its principle is to first detect and score highly conserved factors, the
fragments (also called anchors or seeds), then to compute a maximal score subset of
fragments that are colinear and non-overlapping in both considered sequences,
called an optimal chain. This optimal chain is then used as the backbone of an
alignment, which is completed in a final stage by aligning the gaps located between
consecutive selected fragments. This approach is used in several computational
biology applications, such as whole genome comparison |18lll7j, cDNA/EST
mapping HU, or identifying regions with conserved synteny.

In the present work we are interested in the problem of computing an optimal
chain of fragments¹ from a given set of k fragments, for two sequences
t and u of respective lengths n and m. Due to its applications, especially in
computational biology, this problem has received a lot of attention from the
algorithmic community |3l4l8l9ll0lllllj. The fragment chaining problem can be
solved in $O(k + n \times m)$ time by using a simple dynamic programming (DP)
algorithm (see [3] for example). However, in practical applications, the number
k of fragments can be subquadratic, which motivated the design of algorithms
whose complexity depends only on k and that run in $O(k \log k)$ worst-case time
(see [4,8,10,12]). The latter algorithms, known as Line Sweep (LS) algorithms,
rely on geometric properties of the problem, where fragments can be seen as
rectangles in the quarter plane, and on geometric data structures that allow one to
retrieve and update optimal subchains efficiently (i.e., in logarithmic time) (see [12]
for example).

¹ We focus here on the problem of computing the score of an optimal chain, but our
algorithm can be complemented by a standard backtracking procedure to compute
an actual optimal chain.

This raises the natural question of deciding which algorithm to use when
comparing two sequences t and u. In particular, the density of fragments
can differ depending on the location of the fragments in the considered
sequences, due, for example, to the presence of repeats. In such cases, it might
be more efficient to rely on the DP algorithm in regions with high fragment
density, and on the LS algorithm in regions of lower fragment density. This
motivates the theoretical question we consider, which asks to design an efficient
algorithm that relies on the classical DP principle when the density of fragments
is high and switches to the LS principle when processing parts of the sequences
with a low density of fragments. We show that this can be achieved, and we
describe such a hybrid DP/LS algorithm for computing the score of an optimal
chain of fragments between two sequences. We prove that our algorithm achieves
a theoretical complexity that is as good as that of both the DP and LS algorithms,
i.e., for any instance, our algorithm performs at least as well, in terms of
theoretical worst-case asymptotic time complexity, as both the DP and the LS
algorithm.

In Section 2 we formally introduce the fragment chaining problem and the
DP and LS algorithms. In Section 3 we describe our hybrid algorithm and
analyze its complexity.

2 Preliminaries 

Preliminary definitions and problem statement. Let t and u be two sequences, of
respective lengths n and m. We assume that position indices in sequences start
at 0, so t[0] is the first symbol in t and t[n − 1] its last symbol. As usual, by t[i, j]
we denote the substring of t composed of the symbols in positions i, i + 1, ..., j.

A fragment is a factor that is common, possibly up to small variations, to
t and u. Formally, a fragment s is defined by five elements (s.ℓ, s.r, s.t, s.b, s.s):
the first four fields indicate that the corresponding substrings are t[s.ℓ, s.r] and
u[s.b, s.t], while the field s.s is a score associated to the fragment. We call borders
of s the coordinates (s.ℓ, s.b) and (s.r, s.t). As usual in chaining problems, we
see fragments as rectangles in the quarter plane, where the x-axis corresponds
to t and the y-axis to u. For a fragment s, s.ℓ, s.r, s.b and s.t denote the left and
right positions of s over t and the bottom and top positions of s over u (s.ℓ ≤ s.r
and s.b ≤ s.t). See Figure 1 for an example.

Let S denote a set of k fragments for t and u. A chain is a set of fragments
$\{s_1, \ldots, s_l\}$ such that $s_i.r < s_{i+1}.\ell$ and $s_i.t < s_{i+1}.b$ for $i = 1, \ldots, l - 1$; the
score of a chain is the sum of the scores of the fragments it contains. A chain is
optimal if there is no chain with a higher score. The problem we consider in the
present work is to compute the score of an optimal chain.





Fig. 1. Example of the fragment chaining problem with three fragments represented
by squares. Possible chains are (s), (s'), (s''), (s, s'') and (s', s''). The best chain is (s, s''),
with a score of 7.


The dynamic programming (DP) algorithm. We first present a simple dynamic
programming (DP) algorithm that computes an n × m dynamic programming
table M such that M[i][j] is the score of an optimal chain for the prefixes
t[0, i] and u[0, j] (see Algorithm 1). We present here a version that does not
instantiate the full n × m DP table, but records only the last filled column, following
the classical technique introduced in [6] and used in the space-efficient fragment
chaining DP algorithm described in [9].

The difference with Morgenstern's space-efficient DP algorithm [9] is that we
still require quadratic space for the data structure L. In terms of computing
the score of an optimal chain, the key point is that S[s], if defined, contains the
optimal score of a chain that has s as its last fragment. The worst-case time
complexity of this algorithm is clearly $O(k + n \times m)$.

The Line Sweep (LS) algorithm. We now describe a Line Sweep algorithm for
the fragment chaining problem (see Algorithm 2). The main idea is to process
fragments according to their order in the sequence t, while maintaining a data
structure that records, for each position i in u, the best partial chain found so
far using only fragments below position i.

In this algorithm, P stores all fragment borders and S[s], as in the DP algorithm,
is the score of an optimal chain among all the chains that end with fragment
s. A fragment s is said to have been processed after the entry (s.r, end, s) has
been processed by the loop in line 8. A partial chain is a chain composed
only of processed fragments.

The data structure A satisfies the following invariant, which is key to ensure
the correctness of the algorithm: if (pos, type, s) is the last entry of P that has
been processed, then A contains an entry (p, v) if and only if the best chaining
score, among partial chains that belong to the rectangle defined by points (0, 0)
and (pos, p), is v and corresponds to a chain ending with a fragment s' such that
s'.t = p.

Lines 13-17 ensure that this invariant is maintained. This invariant allows one to retrieve
from A the score of an optimal partial chain that can be extended by the current












Algorithm 1 The Dynamic Programming algorithm

1 L: an array of n × m linked lists
2 S: an array of k integers
3 M: an array of m integers, initialized to 0
4 foreach s in S do
5   front insert (s, end) into L[s.r][s.t]
6   front insert (s, begin) into L[s.ℓ][s.b]
7 for i from 0 to n − 1
8   left = 0
9   leftDown = 0
10  for j from 0 to m − 1
11    maxC = 0
12    leftDown = left // value of M at (i − 1, j − 1)
13    left = M[j] // value of M at (i − 1, j), saved before M[j] is overwritten
14    foreach (s, type) in L[i][j]
15      if type is begin
16        S[s] = s.s + leftDown
17      if type is end and S[s] > maxC
18        maxC = S[s]
19    M[j] = max(M[j], M[j − 1], maxC) // with M[−1] = 0
20 return M[m − 1]
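A direct Python transcription of Algorithm 1 can help check the column-by-column recurrence. This is a sketch under our own assumptions: the n × m array of lists L is replaced by a dictionary keyed by cells (so only cells holding borders are stored), and fragments use a hypothetical `Fragment` record with fields `l`, `r`, `b`, `t` and `score`.

```python
from collections import defaultdict
from typing import NamedTuple, Sequence


class Fragment(NamedTuple):
    l: int      # s.ℓ: left position in t
    r: int      # s.r: right position in t
    b: int      # s.b: bottom position in u
    t: int      # s.t: top position in u
    score: int  # s.s


def dp_chain_score(frags: Sequence[Fragment], n: int, m: int) -> int:
    """Optimal chain score in O(k + n*m) time, keeping a single column M."""
    cells = defaultdict(list)  # (i, j) -> [(fragment index, kind)]
    for idx, f in enumerate(frags):
        cells[(f.r, f.t)].insert(0, (idx, "end"))    # front insert
        cells[(f.l, f.b)].insert(0, (idx, "begin"))  # begin lands ahead of its own end
    S = [0] * len(frags)
    M = [0] * m  # holds column i - 1, overwritten in place by column i
    for i in range(n):
        left = 0
        for j in range(m):
            left_down = left  # value of M at (i - 1, j - 1)
            left = M[j]       # value of M at (i - 1, j), saved before overwrite
            max_c = 0
            for idx, kind in cells.get((i, j), ()):
                if kind == "begin":
                    S[idx] = frags[idx].score + left_down
                elif S[idx] > max_c:
                    max_c = S[idx]
            below = M[j - 1] if j > 0 else 0  # value of M at (i, j - 1)
            M[j] = max(M[j], below, max_c)
    return M[m - 1] if m > 0 else 0
```

Reading leftDown and left before scanning the cell is what makes a fragment beginning at (i, j) see exactly the chains contained in the rectangle (0, 0)-(i − 1, j − 1).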


Algorithm 2 The Line Sweep algorithm

1 P: an array of 2k triples (position, type, fragment)
2 A: a set of pairs (position, score)
3 S: an array of k integers
4 foreach s in S do
5   insert (s.ℓ, begin, s) into P
6   insert (s.r, end, s) into P
7 sort P according to the field position, with begin positions appearing before end positions having the same value
8 foreach (pos, type, s) in P
9   if type is begin
10    retrieve from A the pair (p, v) such that p is the highest position strictly less than s.b (v = 0 if there is no such pair)
11    S[s] = s.s + v
12  if type is end
13    (p, v) = retrieve from A the pair with the highest position less than or equal to s.t (p = −1 and v = 0 if there is no such pair)
14    if S[s] > v
15      retrieve from A the pair (p', v') such that v' is the highest score less than or equal to S[s]
16      remove from A all entries (p'', v'') such that p < p'' ≤ p'
17      insert (s.t, S[s]) into A
18 (p, v) = last entry of A
19 return v




fragment s, i.e., a partial chain whose last fragment ends at a position of u strictly
smaller than s.b (lines 10-11). This property follows from the fact that the order in
which fragments are processed ensures that all previously processed fragments do not
overlap with the current fragment in t.

In order to implement this algorithm efficiently, it is fundamental to ensure
that, in line 16, the time required to remove c entries (the set of all entries of A
with first field strictly greater than p and lower than or equal to p') is $O(c \log k)$.
If A is implemented in a data structure that satisfies this property and supports
searches, insertions and deletions in logarithmic time, then the time complexity
of the algorithm is $O(k \log k)$; see [H] for a discussion on such data structures.
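The Line Sweep principle can be sketched in Python as follows. For brevity, this sketch stores A as two parallel sorted lists handled with the standard `bisect` module; list insertions and deletions take linear time, so it does not reach the $O(k \log k)$ bound, which requires a balanced search tree as discussed above. The removal step drops every entry at a position at least s.t whose score is at most S[s] before inserting (s.t, S[s]), which is our reading of lines 14-17. The `Fragment` record is again a hypothetical stand-in.

```python
import bisect
from typing import NamedTuple, Sequence


class Fragment(NamedTuple):
    l: int      # s.ℓ
    r: int      # s.r
    b: int      # s.b
    t: int      # s.t
    score: int  # s.s


def ls_chain_score(frags: Sequence[Fragment]) -> int:
    """Line Sweep over the borders in t; A is kept as a staircase of (position, score)."""
    events = []
    for idx, f in enumerate(frags):
        events.append((f.l, 0, idx))  # 0 = begin: sorts before an end at the same position
        events.append((f.r, 1, idx))  # 1 = end
    events.sort()
    pos: list[int] = []  # positions in u, increasing
    val: list[int] = []  # matching chain scores, increasing as well
    S = [0] * len(frags)
    for _, kind, idx in events:
        f = frags[idx]
        if kind == 0:
            q = bisect.bisect_left(pos, f.b) - 1   # highest position strictly below f.b
            S[idx] = f.score + (val[q] if q >= 0 else 0)
        else:
            q = bisect.bisect_right(pos, f.t) - 1  # highest position <= f.t
            if S[idx] > (val[q] if q >= 0 else 0):
                lo = bisect.bisect_left(pos, f.t)
                hi = lo
                while hi < len(pos) and val[hi] <= S[idx]:  # entries now dominated
                    hi += 1
                del pos[lo:hi], val[lo:hi]
                pos.insert(lo, f.t)
                val.insert(lo, S[idx])
    return val[-1] if val else 0
```

Because only non-dominated entries are kept, the last entry of the staircase always holds the best score seen so far, which is what lines 18-19 return.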


3 A hybrid algorithm

We now describe an algorithm that combines both approaches described in the 
previous section. 

Overview. We first introduce the notion of compact instance. An instance of the
chaining problem is said to be compact if each position of t and each position
of u contains at least one border. If an instance is not compact, then there
exists a unique compact instance obtained by removing from t and from u all
positions that do not contain a fragment border, leading to sequences t' and
u', and updating the fragment borders according to the sequences t' and u',
leading to a set S' of fragments. From now on, we denote by (t', u', S') the compact
instance corresponding to (t, u, S), and by n' and m' the lengths of t' and u'.

Next, we define, for a position p of t, its border density $\mathcal{K}_p$ as the number of
fragment borders (i.e., the number of fragment extremities) located in t[p]. If $P^{DP}$ is
the set of positions in t' with border density strictly greater than $m'/(\log m' - 1)$, and
$P^{LS}$ the remaining $n' - |P^{DP}|$ positions of t', then the hybrid DP/LS algorithm we
describe below has time complexity

$$O\Big(k + \min(k \log k, m) + \min(k \log k, n) + \sum_{p \in P^{DP}} (m' + \mathcal{K}_p) + \log(m') \sum_{p \in P^{LS}} \mathcal{K}_p\Big).$$

Intuitively, our hybrid algorithm works on a compact instance and fills in the
DP table for this compact instance, deciding for each column of this table (i.e.,
each position of t') whether to fill it in using the DP equations or the Line Sweep
principle, based on its border density.

Compacting an instance. We first describe how to compute the compact instance
(t', u', S').

Lemma 1. The compact instance (t', u', S') can be computed in time
$O(k + \min(k \log k, m) + \min(k \log k, n))$ and space $O(k + n + m)$.






The proof of this lemma is quite straightforward, and we omit the details
here for space reasons. Assume we are dealing with t (the same method applies
to u).

- If $k \log k < m$, then we (1) sort the fragment extremities in t in increasing
order of their position, (2) cluster together fragment extremities with
the same value, and (3) relabel the coordinates of each fragment extremity
using the number of clusters preceding it in the order, plus one.

- If $k \log k \ge m$, then we (1) detect the positions of t with no fragment extremities,
in $O(k + m)$ time, (2) mark them and relabel the positions with non-zero
density in $O(m)$ time, and finally (3) relabel the fragment extremities according
to the new labels of their positions, in $O(k)$ time.
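The sorting-based branch (the case $k \log k < m$) can be sketched in Python as below. Here labels are 0-based ranks among the distinct border positions, matching the convention that positions start at 0; since the relabelling is strictly increasing, it preserves every strict comparison used in the definition of a chain. Function names and the `Fragment` record are illustrative.

```python
from typing import NamedTuple, Sequence


class Fragment(NamedTuple):
    l: int
    r: int
    b: int
    t: int
    score: int


def ranks(values: Sequence[int]) -> tuple[list[int], int]:
    """Replace each value by its rank among the distinct values (0-based)."""
    distinct = sorted(set(values))
    rank = {v: i for i, v in enumerate(distinct)}
    return [rank[v] for v in values], len(distinct)


def compact_instance(frags: Sequence[Fragment]) -> tuple[list[Fragment], int, int]:
    """Return (S', n', m'): fragments relabelled on t' and u', and the new lengths."""
    t_labels, n_prime = ranks([x for f in frags for x in (f.l, f.r)])
    u_labels, m_prime = ranks([y for f in frags for y in (f.b, f.t)])
    compact = [
        Fragment(t_labels[2 * i], t_labels[2 * i + 1],
                 u_labels[2 * i], u_labels[2 * i + 1], f.score)
        for i, f in enumerate(frags)
    ]
    return compact, n_prime, m_prime
```

Extremities that share a position (for example s.r of one fragment and s.ℓ of another) receive the same label, which is the clustering step (2).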

From now on, we assume that the compact instance has been computed and
that it is the considered instance.

DP update vs LS update. In this section, we introduce our main idea. The
principle is to consider fragments in the same order as in the LS algorithm (i.e.,
through a loop over indices 0 to n' − 1, a feature which is common to both
the DP and LS algorithms), but to process the fragments whose border in t' is
in position i using either the DP approach if the density of fragments at t'[i] is
high, or the LS approach otherwise. Hence, the key requirements will be that:

- when using the DP approach, the previous column of the DP table is
available;

- when using the LS approach, a data structure with properties similar to those of
the data structure A used in the LS algorithm is available.

A hybrid data structure. We now introduce a data structure B that ensures that
the above requirements are satisfied. The data structure B is essentially an array
of m' entries augmented with a balanced binary search tree. Formally:

- We consider an array B of m' entries, such that B[i] contains chaining scores
and satisfies the following invariant: if s is the last processed fragment, for
every i = 1, ..., s.r, B[i] ≥ B[i − 1].

- We augment this array with a balanced binary search tree C whose leaves
are the entries of B and whose internal nodes are labeled so as to satisfy
the following invariant: a node x is labeled by the maximum of the labels of
its left and right children.

The data structure B will be used in a similar way to the data structure
A of the LS algorithm, i.e., to answer the following queries: given 0 ≤ p < m',
find the optimal score of a partial chain whose last fragment s satisfies s.t < p.
This principle is very similar to solutions recently proposed for handling dynamic
range minimum queries [5].

We now describe how we implement this data structure using an array. Let
h be the smallest integer such that $m' \le 2^h$. We encode B into an array of size
$2^{h+1}$, whose prefix of length $2^h - 1$ contains the labels of the internal nodes of
the binary tree C (so each cell contains a label and the indexes of two other cells,
corresponding respectively to the left child and the right child), ordered in
breadth-first order, while the entries of B are stored in the following cells, starting at
position $2^h - 1$ (see Figure 2). From now on, we identify nodes of the binary tree and
cells of the array, which we denote by B.
Fig. 2. Example of the implementation of the data structure B with an array.


Using this implementation, for a given node of the binary search tree, say
encoded by the cell in position x in B (called node x from now on), we can quickly
obtain the position, in the array, of its left child and of its right child, but also of its
parent (if B[x] is not the root) and of its rightmost descendant, defined as the
unique node reached by a maximal path of edges to right children starting at x.
Indeed, it is straightforward to verify that the constraint of ordering the nodes of
the binary tree in the array according to a breadth-first order implies that, for
node x, if y is the largest integer such that $2^y \le x + 1$ and $z = x - 2^y + 1$, then:

- if $x \ge 2^h - 1$, x is a leaf;

- $\mathit{leftChild}(x) = 2^{y+1} - 1 + 2z$ if x is not a leaf;

- $\mathit{rightChild}(x) = 2^{y+1} - 1 + 2z + 1$ if x is not a leaf;

- $\mathit{parent}(x) = -1$ if $x = 0$ (x is the root);

- $\mathit{parent}(x) = 2^{y-1} - 1 + \lfloor z/2 \rfloor$ if $x \neq 0$;

- $\mathit{rightmostChild}(x) = 2^h - 1 + (z + 1)\,2^{h-y} - 1$.
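The index formulas above translate directly into Python; the sketch below also checks them against the usual breadth-first layout, in which the children of x are 2x + 1 and 2x + 2 and its parent is ⌊(x − 1)/2⌋. The function names mirror those of the text; passing h explicitly is our own choice.

```python
def level_and_offset(x: int) -> tuple[int, int]:
    """Return (y, z): y is the largest integer with 2**y <= x + 1, z = x - 2**y + 1."""
    y = (x + 1).bit_length() - 1
    return y, x - 2 ** y + 1


def is_leaf(x: int, h: int) -> bool:
    return x >= 2 ** h - 1


def left_child(x: int) -> int:
    y, z = level_and_offset(x)
    return 2 ** (y + 1) - 1 + 2 * z


def right_child(x: int) -> int:
    y, z = level_and_offset(x)
    return 2 ** (y + 1) - 1 + 2 * z + 1


def parent(x: int) -> int:
    if x == 0:
        return -1  # x is the root
    y, z = level_and_offset(x)
    return 2 ** (y - 1) - 1 + z // 2


def rightmost_child(x: int, h: int) -> int:
    """Leaf reached from x by following right children only."""
    y, z = level_and_offset(x)
    return 2 ** h - 1 + (z + 1) * 2 ** (h - y) - 1


# Consistency checks against the classical heap layout, for a tree with 2**3 leaves.
H = 3
for node in range(2 ** H - 1):  # internal nodes
    assert left_child(node) == 2 * node + 1 and right_child(node) == 2 * node + 2
for node in range(1, 2 ** (H + 1) - 1):
    assert parent(node) == (node - 1) // 2
for node in range(2 ** (H + 1) - 1):
    leaf = node
    while not is_leaf(leaf, H):
        leaf = right_child(leaf)
    assert rightmost_child(node, H) == leaf
```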

Implementing the DP and LS algorithms with the hybrid data structure. It is
then easy to implement the DP algorithm using the data structure B, by using
B as the current column of the DP table (i.e., if the currently processed position
of t' is i, B[j] is the score of the best partial chain included in the rectangle
defined by (0, 0) and (i, j)), without updating the internal nodes of the binary
search tree C.

To implement the LS algorithm, the key points are:

- to be able to efficiently update the data structure B when a fragment s has
been processed;

- to be able to find the best score of a partial chain ending at a position in
u' strictly below p.


















Algorithm 3 Set a chaining score for a position p.

1 setScore(B, p, score):
2   index = 2^h − 1 + p // start from the leaf corresponding to p
3   while index ≠ −1 && B[index] < score
4     B[index] = score
5     index = parent(index)


Algorithm 4 Retrieve the best chaining score for partial chains ending strictly
below position p.

1 getBestScore(B, p):
2   let h be the smallest integer s.t. m' ≤ 2^h
3   if p = 0 then return 0 // no position of u' lies strictly below 0
4   maxScore = 0
5   currentNode = 0 // the root node
6   target = 2^h − 1 + (p − 1) // leaf of position p − 1, the highest position strictly below p
7   while rightmostChild(currentNode) > target
8     left = leftChild(currentNode)
9     if rightmostChild(left) >= target // move left
10      currentNode = left
11    else // move right: the whole left subtree lies strictly below p
12      maxScore = max(maxScore, B[left])
13      currentNode = rightChild(currentNode)
14  return max(maxScore, B[currentNode])




Updating B can be done through the function setScore (Algorithm 3), with
parameters p = s.t and score = S[s], while the second task can be achieved by the
function getBestScore (Algorithm 4), which is a simple binary tree search.
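Algorithms 3 and 4 can be sketched in Python as a small class over one flat list. This sketch assumes positions of u' are 0, ..., m' − 1, uses the equivalent heap-style arithmetic (children 2x + 1 and 2x + 2, parent ⌊(x − 1)/2⌋) for children and parents, keeps the level-based formula for the rightmost descendant, and answers a query for positions strictly below p by descending towards the leaf of position p − 1. The class name `MaxTree` is ours.

```python
class MaxTree:
    """Array-encoded complete binary tree; each internal node holds the max of its children."""

    def __init__(self, m_prime: int) -> None:
        self.h = max(0, (m_prime - 1).bit_length())  # smallest h with m' <= 2**h
        self.base = 2 ** self.h - 1                    # index of the leaf of position 0
        self.a = [0] * (2 ** (self.h + 1) - 1)

    def rightmost(self, x: int) -> int:
        """Index of the rightmost leaf below node x (level-based formula)."""
        y = (x + 1).bit_length() - 1
        z = x - 2 ** y + 1
        return self.base + (z + 1) * 2 ** (self.h - y) - 1

    def set_score(self, p: int, score: int) -> None:
        """Algorithm 3: raise leaf p and its ancestors to at least `score`."""
        x = self.base + p
        while x != -1 and self.a[x] < score:
            self.a[x] = score
            x = (x - 1) // 2 if x > 0 else -1

    def get_best_score(self, p: int) -> int:
        """Algorithm 4: best score stored at a position strictly below p."""
        if p <= 0:
            return 0
        target = self.base + p - 1  # leaf of position p - 1
        best, x = 0, 0
        while self.rightmost(x) > target:
            left = 2 * x + 1
            if self.rightmost(left) >= target:
                x = left                        # target lies in the left subtree
            else:
                best = max(best, self.a[left])  # whole left subtree is below p
                x = left + 1
        return max(best, self.a[x])
```

The early exit in `set_score` relies on the invariant that every internal node already holds the maximum of its subtree: once an ancestor is at least `score`, all higher ancestors are too.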

It is straightforward to see that if all updates of B are done using the function
setScore, then the two required invariants on B are satisfied. The time complexity
of both setScore and getBestScore is $O(\log m')$, due to the fact that the
binary tree is balanced. So we can now implement the LS algorithm on compact
instances using the data structure B by replacing the instruction in line 10 of
the LS algorithm by a call to getBestScore(B, s.b), the block of instructions in
lines 13-17 by setScore(B, s.t, S[s]), and by reading the optimal chain score in the
root of the binary tree. The operations over B take time logarithmic in m',
and $m' \le 2k$. Thus the overall time complexity is $O(k \log m')$.


LS/DP update with the hybrid data structure. So, in a hybrid algorithm that
relies on the data structure B, when the algorithm switches approaches (from
DP to LS, or from LS to DP), the data structure B is assumed to be consistent for
the current approach, and needs to be updated to become consistent for the next
approach.

So, when switching from DP (say at position i − 1, i = 1, ..., n' − 1) to LS (at position
i), we assume that B[j] (j = 0, ..., m' − 1) is the optimal score of a partial chain
in the rectangle defined by (0, 0) and (i − 1, j), and we want to update B in such
a way that the label of any internal node x of the binary tree is the maximum
of the labels of its two children. As the entries of B are the leaves of the binary tree,
this update can be done during a post-order traversal of the binary tree, in time $O(m')$.

When switching from LS to DP (say to use the DP approach on position i
while the LS approach was used on position i − 1), we assume that, for every leaf
B[j] of the binary tree, the value in B[j] is the optimal score of a partial chain
whose last fragment ends in position j of u' (among the fragments processed so far,
i.e., those ending in t' at a position at most i − 1); this follows immediately from
the way labels of the leaves of the binary tree are inserted by the setScore function.
To update B, we want B[j] to be the optimal score of a partial chain whose last
fragment ends in position at most j. So the update function only needs to give B[j]
the value $\max_{0 \le j' \le j} B[j']$, which can again be done in time $O(m')$.

So updating the data structure B from DP to LS or from LS to DP can be done
in time $O(m')$. We denote by update the function performing this update.
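The two conversions performed by update can be sketched on a flat array laid out as in Figure 2, with the leaves of positions 0, ..., m' − 1 stored from index $2^h - 1$. Instead of an explicit post-order traversal, the DP-to-LS direction below visits internal nodes by decreasing index, which also handles every child before its parent in a breadth-first layout. The function names are illustrative.

```python
def dp_to_ls(a: list[int], h: int) -> None:
    """Relabel every internal node with the maximum of its two children, in O(m') time."""
    for x in range(2 ** h - 2, -1, -1):  # children have larger indices than parents
        a[x] = max(a[2 * x + 1], a[2 * x + 2])


def ls_to_dp(a: list[int], h: int, m_prime: int) -> None:
    """Turn leaf j into the maximum of leaves 0..j (prefix maxima), in O(m') time."""
    base = 2 ** h - 1
    for j in range(1, m_prime):
        a[base + j] = max(a[base + j], a[base + j - 1])
```

For example, with h = 2 and leaves [2, 0, 5, 0], `ls_to_dp` turns the first three leaves into [2, 2, 5], after which `dp_to_ls` sets the root to 5.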


Deciding between LS and DP using the fragment density. Before we can finally
introduce our algorithm, we need to address the key point of how to decide which
paradigm (DP or LS) to use when processing the fragments having a border in
the current position of t', say c. Let $\mathcal{K}_c$ be the number of fragments s such that
s.ℓ = c or s.r = c. Using the DP approach, the cost of updating B (i.e., of
computing column c of the DP table) is $O(m' + \mathcal{K}_c)$. With the LS approach,
the cost of updating B is $O(\mathcal{K}_c \log m')$.

Comparing the two costs, $m' + \mathcal{K}_c < \mathcal{K}_c \log m'$ holds if and only if
$\mathcal{K}_c (\log m' - 1) > m'$, that is (when $\log m' > 1$) $\mathcal{K}_c > m'/(\log m' - 1)$. So, if
$\mathcal{K}_c > m'/(\log m' - 1)$, the asymptotic cost of the DP approach is better than
the asymptotic cost of the LS approach, while the converse holds if $\mathcal{K}_c < m'/(\log m' - 1)$.





So, prior to processing fragments, for each position i of t' (i = 0, ..., n' − 1), we
record in an array C whether the fragment borders in position i are processed using the
DP approach (C[i] contains DP) or the LS approach (C[i] contains LS). This
last observation leads to our main result, Algorithm 5 below.


Algorithm 5 A hybrid algorithm for the fragment chaining problem.

1 compute the compact instance (t', u', S')
2 L1: an array of n' × 2 linked lists
3 C: a binary array of size n', where C[i] is DP if K_i > m'/(log m' − 1) and LS otherwise
4 foreach s in S' do
5   if C[s.r] is DP then back insert (s, end, s.t) into L1[s.r][1]
6   else back insert (s, end) into L1[s.r][0]
7   if C[s.ℓ] is DP then front insert (s, begin, s.b) into L1[s.ℓ][1]
8   else front insert (s, begin) into L1[s.ℓ][0]
9 B: a binary tree with m' leaves (all nodes are set to zero)
10 B: refers to the m' leaves of B
11 S: an array of integers of size k
12 for i from 0 to n' − 1 do
13   if i > 0 and C[i] ≠ C[i − 1] then update(B)
14   if C[i] is DP
15     L2: an array of m' linked lists
16     foreach (s, type, j) in L1[i][1] do back insert (s, type) into L2[j]
17     left = 0, leftDown = 0
18     for j from 0 to m' − 1 do
19       maxC = 0
20       leftDown = left, left = B[j]
21       foreach (s, type) in L2[j] do
22         if type is begin then S[s] = s.s + leftDown
23         if type is end and S[s] > maxC then maxC = S[s]
24       B[j] = max(B[j], B[j − 1], maxC) // with B[−1] = 0
25   else // C[i] is LS
26     foreach (s, type) in L1[i][0] do // begin entries come before end entries (lines 5-8)
27       if type is begin then S[s] = s.s + getBestScore(B, s.b)
28       if type is end then setScore(B, s.t, S[s])
29 if C[n' − 1] is DP then return B[m' − 1]
30 else return value of the root of B
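Putting the pieces together, the following Python sketch mirrors Algorithm 5 on an instance that is assumed to be already compact (positions 0, ..., n' − 1 in t' and 0, ..., m' − 1 in u'). It keeps begin entries ahead of end entries at each position, counts densities as numbers of fragment extremities, uses base-2 logarithms in the threshold, and accepts an optional `modes` list (a hypothetical parameter, not part of the algorithm) that overrides C so that the DP, LS and mixed paths can be compared on the same input.

```python
import math
from collections import defaultdict
from typing import NamedTuple, Optional, Sequence


class Fragment(NamedTuple):
    l: int
    r: int
    b: int
    t: int
    score: int


def hybrid_chain_score(frags: Sequence[Fragment], n: int, m: int,
                       modes: Optional[list[str]] = None) -> int:
    if not frags:
        return 0
    h = max(0, (m - 1).bit_length())  # smallest h with m' <= 2**h
    base = 2 ** h - 1                  # leaf of position 0 of u'
    a = [0] * (2 ** (h + 1) - 1)       # tree B in breadth-first order

    def rightmost(x: int) -> int:
        y = (x + 1).bit_length() - 1
        return base + (x - 2 ** y + 2) * 2 ** (h - y) - 1

    def set_score(p: int, score: int) -> None:  # Algorithm 3
        x = base + p
        while x != -1 and a[x] < score:
            a[x] = score
            x = (x - 1) // 2 if x > 0 else -1

    def get_best_score(p: int) -> int:  # Algorithm 4, strictly below p
        if p <= 0:
            return 0
        target, best, x = base + p - 1, 0, 0
        while rightmost(x) > target:
            left = 2 * x + 1
            if rightmost(left) >= target:
                x = left
            else:
                best, x = max(best, a[left]), left + 1
        return max(best, a[x])

    begins, ends = defaultdict(list), defaultdict(list)  # L1, split by border type
    for idx, f in enumerate(frags):
        begins[f.l].append(idx)
        ends[f.r].append(idx)
    if modes is None:  # array C
        thr = m / (math.log2(m) - 1) if m > 2 else math.inf
        modes = ["DP" if len(begins[i]) + len(ends[i]) > thr else "LS" for i in range(n)]
    S = [0] * len(frags)
    for i in range(n):
        if i > 0 and modes[i] != modes[i - 1]:  # update(B)
            if modes[i] == "LS":  # DP -> LS: relabel internal nodes bottom-up
                for x in range(base - 1, -1, -1):
                    a[x] = max(a[2 * x + 1], a[2 * x + 2])
            else:                 # LS -> DP: prefix maxima over the leaves
                for j in range(1, m):
                    a[base + j] = max(a[base + j], a[base + j - 1])
        if modes[i] == "DP":
            col_begin, col_end = defaultdict(list), defaultdict(list)  # L2
            for idx in begins[i]:
                col_begin[frags[idx].b].append(idx)
            for idx in ends[i]:
                col_end[frags[idx].t].append(idx)
            left = 0
            for j in range(m):
                left_down, left = left, a[base + j]
                for idx in col_begin[j]:
                    S[idx] = frags[idx].score + left_down
                max_c = max((S[idx] for idx in col_end[j]), default=0)
                below = a[base + j - 1] if j > 0 else 0
                a[base + j] = max(a[base + j], below, max_c)
        else:
            for idx in begins[i]:
                S[idx] = frags[idx].score + get_best_score(frags[idx].b)
            for idx in ends[i]:
                set_score(frags[idx].t, S[idx])
    return a[base + m - 1] if modes[n - 1] == "DP" else a[0]
```

On an instance whose first column holds six borders with m' = 4 (threshold 4), the default rule processes that column with DP and the next one with LS, exercising the DP-to-LS conversion.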


Time and space complexity. In terms of space complexity, we avoid using
$O(k + n' \times m')$ space for storing the fragment borders in n' × m' lists
(structure L of the DP algorithm) by using two lists: L1[i] stores all fragment
borders in position i of t', while L2[j] stores all fragment borders in position i
of t' and j of u', and is computed from L1[i]. So the total space requirement is
$O(k + m' + n')$.





We now establish the time complexity of this algorithm. If the current position
i of t' is tagged as DP, the cost for updating the column is $O(m' + \mathcal{K}_i)$,
including the cost of setting up L2 from L1, which is proportional to the number
of fragment borders in the current position (lines 14-24). If C[i] is LS, the cost
for computing chain scores on this position is $O(\mathcal{K}_i \log m')$ (lines 25-28). Thus,
if we call $P^{DP}$ the set of positions of t' where we use the DP approach, $P^{LS}$ the set
of positions of t' where we use the LS approach, and $P = P^{DP} \cup P^{LS}$, the time for
the whole loop at line 12 is

$$O\Big(\sum_{p \in P^{DP}} (m' + \mathcal{K}_p) + \sum_{p \in P^{LS}} \mathcal{K}_p \log m'\Big).$$

We have $|P^{DP}| + |P^{LS}| = n'$, $\forall p \in P^{DP}: \mathcal{K}_p > m'/(\log m' - 1)$ and $\forall p \in P^{LS}: \mathcal{K}_p \le m'/(\log m' - 1)$.

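Moreover, updating the data structure B from LS to DP or from DP to LS (line 13)
is done at most $2|P^{DP}|$ times, since every switch occurs next to a position of $P^{DP}$
and each such position is adjacent to at most two switches. The total cost of this
operation is thus $O(\sum_{p \in P^{DP}} m')$, and it can be integrated, asymptotically, into the
cost of processing the positions in $P^{DP}$.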


Theorem 1. The hybrid algorithm computes an optimal chain score in time

$$O\Big(k + \min(k \log k, m) + \min(k \log k, n) + \sum_{p \in P^{DP}} (m' + \mathcal{K}_p) + \log m' \sum_{p \in P^{LS}} \mathcal{K}_p\Big) \qquad (1)$$

and space $O(k + n + m)$.

To conclude the complexity analysis, we show that the hybrid algorithm
performs at least as well, asymptotically, as both the DP and the LS algorithms.
From (1), we deduce that, if $P^{LS} = P$, the time complexity of the hybrid algorithm
becomes $O(k + \min(k \log k, m) + \min(k \log k, n) + k \log m')$, which is at
worst equal to the asymptotic worst-case time complexity of the LS algorithm,
as $m' \le \min(m, 2k)$.

Now, if $P^{LS} \neq P$, for every position c in $P^{DP}$ we know that the cost of updating
B and processing c with the DP approach is not worse than processing it with
the LS approach, by the choice of the density threshold on $\mathcal{K}_c$. This ensures that,
asymptotically, the hybrid algorithm performs at least as well as the LS algorithm.

We now consider the DP algorithm. Again, from (1), if $P^{DP} = P$, the complexity
becomes $O(k + \min(k \log k, m) + \min(k \log k, n) + m'n')$, which is at worst equal to
the time complexity of the original dynamic programming algorithm, as $n' \le \min(n, 2k)$
and $m' \le \min(m, 2k)$.

As above, if we now assume that $P^{DP} \neq P$, then we know that the cost of
processing the positions of $P^{LS}$ with the LS approach is asymptotically not worse
than processing them with the DP approach. The cost of updating B when
switching from DP to LS can be integrated into the asymptotic cost of the DP
part. This shows that the hybrid algorithm is, asymptotically, not worse than
the pure DP algorithm.
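The two specializations of (1) used above can be spelled out term by term. Each fragment contributes at most two borders to t', so the densities sum to at most 2k; the following worked check only restates the argument and adds no new result.

```latex
\begin{align*}
&\textstyle\sum_{p \in P} \mathcal{K}_p \le 2k, \qquad n' \le \min(n, 2k), \qquad m' \le \min(m, 2k);\\[4pt]
&P^{LS} = P:\quad \log m' \textstyle\sum_{p \in P^{LS}} \mathcal{K}_p \le 2k \log m' = O(k \log k),\\
&\qquad \min(k \log k, m) + \min(k \log k, n) \le 2k \log k \;\Rightarrow\; (1) = O(k \log k);\\[4pt]
&P^{DP} = P:\quad \textstyle\sum_{p \in P^{DP}} (m' + \mathcal{K}_p) \le n'm' + 2k \le nm + 2k,\\
&\qquad \min(k \log k, m) + \min(k \log k, n) \le n + m \le 2nm \;\Rightarrow\; (1) = O(k + nm).
\end{align*}
```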








4 Discussion 


Our main result in the present paper is a hybrid algorithm that combines the
positive features of both the classical dynamic programming and the line
sweep algorithms for the fragment chaining problem. We showed that a simple
data structure can be used to alternate between both algorithmic principles,
thus benefiting from the positive behavior of both algorithms. Not surprisingly, the
choice between using the DP or the LS principle is based on fragment density.

It is easy to define instances where the hybrid algorithm performs better,
asymptotically, than both the DP and LS algorithms. For example, for suitable
values of m and k (polynomial in n), with many seed extremities on t[0] and on
t[n − 1] and all other extremities spread along t and u, we can show that the
asymptotic complexity of the hybrid algorithm is lower than those of both the DP
and the LS algorithms. However, so far our result is mostly theoretical.
The threshold of m'/(log(m') − 1) is high on real genome data, as it
assumes a very high fragment density that is unlikely to be observed often, at
least in applications such as the alignment of whole bacterial genomes.
Preliminary experiments on such data, following the approach developed
in [13], show that the LS algorithm is slightly more efficient than the hybrid one.
So it remains to be seen whether the hybrid algorithm could result in an effective
speed-up when chaining fragments in actual biological applications, especially those
involving high-throughput sequencing data or overlapping fragments [13]. From a
practical point of view, it is also of interest to consider algorithm engineering aspects,
especially related to the hybrid data structure, to see if this could alleviate the issue
of the high density threshold required to switch between the LS and DP approaches,
and to assess the practical interest of the novel theoretical framework we introduced
in the present paper.